REMARKS 

Reconsideration of this application is respectfully
requested in view of the foregoing amendments and following
remarks.

Response to Rejections Under 35 U.S.C. § 102 

The rejection of claims 1-2, 8-12, and 18-20 under 35
U.S.C. § 102(b) as being anticipated by Sukkar (US 6292778)
is respectfully traversed on the grounds that the Sukkar
patent does not disclose or suggest a method or system for
utterance verification in which normalization of feature
vectors, using normalization parameters of the verification
unit corresponding to the speech segment, is utilized for
adjusting the dynamic range of the feature vectors and
generating a sequence of verification feature vectors for
input to the verification-unit corresponded classifier, as
recited in step (D) of claim 1. Instead, Sukkar discloses
using the ratio of the likelihood that the speech segment
contains the sound associated with the subword hypothesis
to the likelihood that the speech segment consists of a
different sound to indicate the verification scores
(col. 10, line 66 to col. 11, line 3), which is clearly
different from the inventive normalization.

It is respectfully noted that the normalization
recited in step (D) of claim 1 has not yet produced
verification scores, and thus the claimed normalization
cannot possibly correspond to the likelihood score ratio of
Sukkar. In addition, the verification score of each speech
segment in the invention is derived from the sequence of
verification feature vectors of the speech segment, whereas
the verification scores provided by Sukkar are determined
as a likelihood ratio. It is clear that the calculation
of the verification scores in the invention is different
from that of Sukkar.

According to the Examiner, the adjusting parameters
and HMMs disclosed in col. 12, line 56 to col. 13, line 18,
of the Sukkar patent inherently include means and standard
deviation parameters. This conclusion of inherency is
respectfully traversed. In contrast to the adjusting
parameters of Sukkar, which are actually model parameters
of the HMMs and are used to estimate the likelihood of the
subword hypothesis, the means and standard deviations recited
in claims 2 and 12 of the present application are employed
to normalize the feature vectors. The adjusting parameters
of Sukkar have nothing to do with such normalization.

Because claims 1, 2, 11, and 12 recite features that
are different from and not anticipated by Sukkar, dependent
claims 8-10 and 18-20 are also not anticipated by Sukkar,
and withdrawal of the rejection under 35 USC 102(b) in view
of the Sukkar patent is respectfully requested.

Response to Rejections Under 35 U.S.C. § 103 

The rejection of claims 3-7 and 13-17 under 35 U.S.C.
§ 103(a) as being unpatentable over Sukkar (US 6292778) in
view of Carey et al. (US 5526465) is also respectfully
traversed on the grounds that the Carey patent, like the
Sukkar patent, fails to disclose or suggest the inventive
feature of using a normalization to adjust the dynamic
range of feature vectors and generate a sequence of
verification feature vectors for input to the
verification-unit corresponded classifier.

As explained above, the method disclosed in the Sukkar
patent uses the parameters to estimate the likelihood
associated with the subword hypothesis, but not to perform
the normalization on the feature vectors as in the
invention. Those skilled in the art will appreciate that
the training data used for training the MLPs in the
invention are pre-corrupted by noise at different SNR
levels (for example, the speech segments corrupted
by in-car noise with SNRs of 9dB, 3dB, 0dB, -3dB, and -9dB
are used to train the MLPs; see page 11, lines 9-17),
whereas only a certain amount of noise is given in training
by Sukkar. As a result, the method of Sukkar offers poorer
performance than the MLP training provided by the
invention. To demonstrate the advantage of the present
invention, a known method (Sukkar, R.A.,
"Subword-based Minimum Verification Error (SB-MVE) Training
for Task Independent Utterance Verification," Proc.
ICASSP'98, 1998) and the present invention can each be
applied to the same noise-corrupted speech signal and the
verification results compared. The result
can be seen in the supplementary document entitled
"MLP-BASED UTTERANCE VERIFICATION FOR IN-CAR SPEECH
RECOGNITION," which is attached hereto as APPENDIX A.
Briefly, the invention can provide good speech recognition
when the environment is changed, but the method of Sukkar
cannot.

These deficiencies are not made up for by the Carey
patent. In contrast to Carey, the claimed invention uses
an MLP neural network as the classifier for changing the
normalized feature vectors into the verification score, and
not for increasing discrimination between the personal
model and the world model (col. 11, lines 15-20 of Carey).
In addition, Carey uses a Baum-Welch backward pass
algorithm and the likelihood information in MLP training in
order to increase discrimination between the personal model
and the world model, whereas the claimed invention uses an
error back-propagation algorithm and the information of
sequences of verification feature vectors in MLP training
in order to generate the verification scores. Moreover,
Carey requires two values Pp and Pw for speaker utterance
training (col. 11, line 60 to col. 12, line 7), but the
invention only uses the target value for speech segment
training.

Accordingly, the dependent claims 3-7 and 13-17 are
not suggested by the Carey and Sukkar patents, whether
considered individually or in any reasonable combination,
and withdrawal of the rejection under 35 USC 103(a) is
respectfully requested.

CONCLUSION 

In view of the foregoing remarks, reconsideration 
and allowance of the application are now believed to be in 
order, and such action is hereby solicited. If any points 
remain in issue that the Examiner feels may be best 
resolved through a personal or telephone interview, the 
Examiner is kindly requested to contact the undersigned 
attorney at the telephone number listed below. 



Respectfully submitted, 
BACON & THOMAS, PLLC 




By: BENJAMIN E. URCIA 

Registration No. 33,805 

Date: May 18, 2007 

BACON & THOMAS, PLLC 

625 Slaters Lane, 4th Floor 

Alexandria, Virginia 22314 

Telephone: (703) 683-0500

Abstract

In this paper, we present the method of using Multi-Layer Perceptron (MLP) for in-car utterance
verification. In this method, subsyllable-based utterance verification is conducted in our work
for task independent consideration. For each subsyllable verification unit, a verification-specific
MLP is employed. To avoid bias classification of using the MLP, the design of a proper set of
anti-subsyllables to compete with the correct subsyllables for training is required, and a random
generating procedure is introduced for this purpose. To be robust to car noise-level variation, the
noise-immunity learning (Hong and Chen, 1997) is incorporated into the training of MLPs. A
Mandarin digit-string verification task, simulated for the additive in-car noise, was conducted to
demonstrate the effectiveness of using the MLP-based method. Experimental results show that
the proposed method outperforms the HMM-based one, especially when the SNRs are low.



1 Introduction 

The use of automatic speech recognition (ASR) in car environments is natural and has become one of the
most promising applications of ASR. However, it is still a great challenge to achieve high accuracy for
in-car speech recognition. Besides, compared with other ASR applications, it is more difficult to correct
ASR errors and the cost could be high when recognition errors occur. It is, therefore, an important issue
to increase the reliability of ASR systems in car environments. Utterance verification (UV) that is able to
detect and reject recognition errors and noise tokens is a promising technology for this purpose.

Many methods for UV have been proposed in the last decade (Sukkar and Lee, 1996; Pao et al., 1998;
Sukkar, 1998), but few of them were developed for noisy conditions. To apply UV to car applications,
the main concern is how to handle the mismatch between the training and the test sets to
avoid performance degradation. This mismatch is especially serious given the wide range and time-varying
nature of in-car noise, since it is difficult to design a single solution that handles all practical
conditions. As an alternative, Hong and Chen (1997) proposed an RNN-based immunity learning procedure
to train the classification models that can be adapted to match/mismatch noisy conditions for front-end
pre-classification. In this learning, the training material, corrupted by background noise with
gradually decreasing SNR levels, is used to train the model for obtaining the noise immunity. They
also suggested using the neural networks instead of the HMMs for the estimation of noise statistics. Based
on their study, the same technique is applied to UV for the in-car environment, more specifically, for
reducing the effect of additive in-car noise. The multi-layer perceptron (MLP) is utilized for this task.

The MLP, which has been widely investigated in many fields, can be considered as another way of
computing a verification measure. In the study of Modi and Rahim (1997), it was applied to integrate
multiple confidence measures for minimizing the verification errors. In this paper, the MLP is used to
estimate the confidence score of a given subsyllable segment by feeding the same features used for
recognition directly and, more important, to make it robust to car noise-level variation. Taking this
approach, it would be beneficial to combine it with speech recognition that uses
subsyllable-based models for developing in-car speech applications. To reach this goal, the speech
segments of subsyllable and anti-subsyllable corrupted by car noise are used for the MLP training. The
training data design concerning the effectiveness of using this approach is the main subject of this work.
The selection of training patterns to avoid bias classifications and the arrangement of training data for
immunity learning are disclosed in this paper.

The remainder of the paper is organized as follows. In Section 2, the architecture of the MLP-based
UV and the issue concerning the selection of training patterns are discussed. The immunity
learning of MLP is presented in Section 3. In Section 4, an in-car Mandarin digit-string verification task
was performed to evaluate the effectiveness of using the proposed method. A brief conclusion is given in
the last section.



2 MLP-based utterance verification

In the study of Sukkar and Lee (1996), utterance verification is treated as the problem of statistical
hypothesis testing, where the null hypothesis is tested against the alternative hypothesis and the resulting
likelihood ratio is used to indicate the confidence score of the verification target. Utterance verification
also could be regarded as a two-class classification problem, i.e., to classify whether the verification
target is a member of correct-class or incorrect-one. In this paper, we applied MLPs to solve this
two-class classification problem.

2.1 Subsyllable-based MLP verifier

We depicted the subsyllable-based verification by using MLPs in Figure 1. In this verification procedure,
the test utterance was recognized and then segmented into several subsyllable segments according to the
lexical structure of the recognized hypothesis. For each subsyllable segment, a subsyllable-level
verification score is generated by a corresponding MLP. The resulting utterance-level confidence score is
then obtained through combining the verification scores of these subsyllable segments.



Figure 1: Utterance verification block diagram.



The MLP, used for the subsyllable-based verification, is a three-layer architecture with single output
node for generating the classification score. The input layer accesses the acoustic feature vectors of
speech frames and feeds them forward to acquire the classification score. This process is performed on
each frame of subsyllable segment. The subsyllable-level verification score is obtained when the
feed-forward processes on subsyllable segment are finished.

The error back-propagation training algorithm is used in the MLP training. To achieve the goal of
two-class discrimination, the subsyllable segments corresponding to correct-class (or correct
subsyllable) and incorrect-one (or anti-subsyllable) are used as the training materials. In this training
procedure, the criterion of minimum mean-square error is used to minimize the error between the actual
and the desired output scores and is referenced to back-adjust the free parameters of the MLP. The desired
output of 1 and 0 is set to indicate the verification score of the correct subsyllable and the
anti-subsyllable respectively.
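
The training criterion described above can be illustrated with a minimal sketch. It is not the
authors' implementation: the sigmoid activations, the learning rate, the per-sample update, and the
names init_params, forward and train_step are all assumptions made for illustration. Only the
single output node, the mean-square-error criterion and the targets 1 and 0 come from the text.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(n_in, n_hidden, rng):
    """Small random weights for a three-layer MLP with one output node."""
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Return (hidden activations, output score) for one input vector."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    output = sigmoid(sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"])
    return hidden, output


def train_step(params, x, target, lr=0.5):
    """One error back-propagation update for the loss 0.5 * (output - target)^2.

    target is 1.0 for a correct subsyllable and 0.0 for an anti-subsyllable.
    """
    hidden, output = forward(params, x)
    # Gradient of the squared error with respect to the output pre-activation.
    delta_out = (output - target) * output * (1.0 - output)
    # Propagate the error back to each hidden unit (uses the weights before the update).
    delta_hid = [delta_out * params["w2"][j] * h * (1.0 - h) for j, h in enumerate(hidden)]
    for j, h in enumerate(hidden):
        params["w2"][j] -= lr * delta_out * h
        params["b1"][j] -= lr * delta_hid[j]
        for i, xi in enumerate(x):
            params["w1"][j][i] -= lr * delta_hid[j] * xi
    params["b2"] -= lr * delta_out
    return 0.5 * (output - target) ** 2
```

Repeated calls to train_step over a mix of correct (target 1) and anti (target 0) examples push the
output score toward the desired values; the toy dimensions used for testing are unrelated to the
network size reported later in the paper.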

2.2 Training data design

To train the subsyllable-based verification model by using MLP, the correct and incorrect (or anti)
subsyllables are included in the training material for discrimination. In general, the incorrect part would
have to represent all of the correct part's complements. However, since acoustic confusion can always be
found between the target subsyllable and its complements, biased classification is inevitable if all of
the complements are used to compete with the target subsyllable in the MLP training. This problem is
not only due to the acoustic confusion between subsyllables but also derived from the unbalanced
amount of competitors. Therefore, the quantity control and a general representation of anti-subsyllables
are the key issue to this problem, and the "random lexicon", introduced by Sukkar and Lee (1996), is
used. The random lexicon used in their work simulates the incorrect recognition for verification result
observation. In our study, it is utilized to generate a similar amount of anti-subsyllable patterns with
respect to the correct subsyllable patterns for the MLP training. The collection of the training patterns is
listed in the following steps.

Step-1 Transform the training text into syllable-denoted sequence and denote it the correct syllable
       sequence.
Step-2 Perform subsyllable-level segmentation with the correct syllable sequence on training speech
       and collect the subsyllable segments for the correct-part.
Step-3 Duplicate the correct syllable sequence and name the duplication the operating sequence.
Step-4 Randomly cut the syllable in the operating sequence and paste it to a new syllable sequence.
       During this operation, the syllable at the position of the new sequence must be kept different from
       the one with the same position of the correct sequence. The new syllable sequence is denoted the
       random syllable sequence.
Step-5 Perform subsyllable-level segmentation with the random syllable sequence on the same training
       speech and collect the subsyllable segments for the anti-part except the segment with the same
       subsyllable denotation as the correct one.
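
The label-level part of Steps 3 to 5 can be sketched as follows. This is an illustrative sketch under
stated assumptions, not the procedure as implemented by the authors: the cut-and-paste of Step-4 is
modelled as a repeated shuffle, the segmentation itself (which needs a recognizer) is omitted, and the
function names, the lexicon format and the romanized syllable labels in the usage note are hypothetical.

```python
import random


def random_syllable_sequence(correct, rng=None, max_tries=1000):
    """Rearrange the correct sequence so that every position holds a different syllable.

    Step-3: copy the correct sequence (the "operating sequence").
    Step-4: reorder it, rejecting any order that repeats the correct syllable
    at the same position.
    """
    rng = rng or random.Random()
    candidate = list(correct)
    for _ in range(max_tries):
        rng.shuffle(candidate)
        if all(new != old for new, old in zip(candidate, correct)):
            return candidate
    # Some sequences (e.g. one syllable filling most positions) admit no such order.
    raise ValueError("no valid random syllable sequence found")


def anti_subsyllable_labels(correct, rand, lexicon):
    """Step-5 at the label level: keep random subsyllables that differ from the
    correct subsyllable in the same slot.

    lexicon maps a syllable to its (INITIAL, FINAL) pair. Returns a list of
    (position, subsyllable) pairs whose segments would go to the anti-part.
    """
    kept = []
    for pos, (c_syl, r_syl) in enumerate(zip(correct, rand)):
        for c_sub, r_sub in zip(lexicon[c_syl], lexicon[r_syl]):
            if r_sub != c_sub:
                kept.append((pos, r_sub))
    return kept
```

With a hypothetical lexicon {"tsai": ("ts", "ai"), "tai": ("t", "ai")}, a correct syllable "tsai" and
a random syllable "tai" at the same position yield [(0, "t")]: the /t/ segment is kept for the
anti-part and the /ai/ segment is dropped because its denotation matches the correct one.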

Correct syllable (represented by INITIAL /ts/ and FINAL /ai/)
Random syllable (represented by INITIAL /t/ and FINAL /ai/)

Figure 2: The correct and random syllables.

We depict an example in Figure 2 to clearly describe the collection of anti-subsyllables. In this figure,
a random syllable (represented by INITIAL /t/ and FINAL /ai/) and the correct syllable
(represented by INITIAL /ts/ and FINAL /ai/) are located at the same position of the random and correct
syllable sequence, respectively. The correct (/ts/ and /ai/) and random (/t/ and /ai/) subsyllable segments
are obtained after performing speech segmentations with the correct and random syllable sequences on
this utterance, respectively. The subsyllable segments of /ts/ and /ai/ are collected separately for the
correct data of subsyllable /ts/ and /ai/. For anti-subsyllable collection, the segment of /t/ is regarded
as the anti-data of subsyllable /t/. However, the other one, /ai/, is ignored since it is identical to the
correct one in denotation and, partially, in acoustics.

After performing segmentation with the random syllable sequence, the anti-subsyllable segments are
obtained and represented by the most likely segments in the training utterance, but they have different
characteristics from the correct subsyllables. These anti-subsyllable segments can be regarded as the
incorrect recognition results that should be rejected by the system, and they are used to compete with the
correct subsyllable segments in training. Furthermore, the random sequence generation (in Step-4) can
also be regarded as a rearrangement process on the correct syllable sequence. The total syllables in the
random syllable sequence are the same as in the correct syllable sequence. In consequence, a similar amount
of the correct and incorrect patterns can be collected from Step-2 and Step-5. Although some incorrect
patterns are dropped in Step-5, the abandoned quantity can be ignored with respect to the final quantity
of incorrect data in our implementations. Therefore, using the balanced data for the MLP training
guards against biased classification.

3 Robust MLP training

To apply speech technologies for in-car applications, the capability of handling car noise variation is the
main concern. The immunity learning proposed by Hong and Chen (1997) provides an alternative
to tolerate the wide range of noise variations. To obtain noise immunity, the speech signals corrupted by
background noise with several noisy conditions are used in their training, and an RNN-based learning
scheme is utilized for this method. The RNN, which uses feedback connections to carry previous
information forward for more detailed estimations, is a suitable mechanism to model the dynamic change of
speech patterns. Therefore, in their implementation, the speech signals corrupted by different noisy
conditions are sequentially used to train the RNNs.

Based on the same learning technique, speech signals infected with noise are also used for the MLP
training. Different from the RNN-based training, the "clean" training data of subsyllable and
anti-subsyllable is duplicated into several copies for different corruptive conditions and all of them are
used to train the MLP at a time. That is, we expect the MLP training not only to learn the discrimination
capability of two-class but also to tolerate the change of environmental conditions.

4 Experiments 

Effectiveness of the MLP-based UV is examined by simulations on a Mandarin digit-string recognition
task. To simulate the in-car environment, the data area of NTT-AT database (NTT-AT, 1996) was used.
And we chose the in-car noise of CIVIC for the target scenario in this experiment.

4.1 Recognition models 

The ASR used for this task is an HMM-based speech recognizer. We employed 20 subsyllable models,
including 10 three-state INITIAL models and 10 five-state FINAL models, for recognizing the 10
Mandarin digits. These HMMs were trained by using the digit-string database of MAT (Wang, 1997),
which is designated as the "clean" speech data.

4.2 The databases for verification model training

The same digit-string database for training the recognition models was also used to train the verification
models. The training set includes 4683 utterances and 2504 speakers. The length of each digit-string is
ranging from ^ to 7 digits. The development set, including 1159 utterances spoken by 1159 speakers,
was used for selecting the verification models with optimal performances in the iterative training results.
The contents of this part are all 7 digits.

For robust training, two in-car noisy conditions of CIVIC, the in-car noise of high-speed driving
(CIVIC-1) and low-speed driving (CIVIC-2), are used. These two noisy conditions, at
SNRs (in the power levels of speech and noise signals) of 9 dB, 3 dB, -3 dB, and -9 dB, were artificially
added to the speech signals of the training set for noise-immunity learning. The same noisy conditions were
also used to corrupt the development set but using the SNRs of 6 dB, 3 dB, 0 dB, -3 dB, and -6 dB for the
optimal model selection.
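
Adding noise at a prescribed SNR, defined on the power levels of the speech and noise signals, can be
sketched as below. This is a minimal sketch rather than the authors' tooling: the function names are
hypothetical, signals are plain lists of samples, and the noise excerpt is assumed to be at least as
long as the speech segment.

```python
import math


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


def multi_condition_set(clean_segments, noises, snrs_db):
    """Duplicate every clean (samples, label) pair once per noise condition and SNR.

    noises maps a condition name to a noise excerpt. Returns a list of
    (corrupted samples, label, condition name, SNR) tuples.
    """
    return [
        (mix_at_snr(samples, noise, snr), label, name, snr)
        for samples, label in clean_segments
        for name, noise in noises.items()
        for snr in snrs_db
    ]
```

With two noise conditions and four SNRs, each clean segment yields eight corrupted copies, all of which
would be presented to the MLP together during training.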

4.3 Verification models

The feature vector used for recognition and verification consists of 26 coefficients, including 12
mel-cepstral coefficients, 12 delta mel-cepstral coefficients, 1 delta energy, and 1 delta-delta energy. In
the constructions of MLPs, 78 neurons are designed in the input layer for accessing the acoustic feature
vectors of three speech frames. Besides the single output neuron in the output layer for generating the
classification score, 30 neurons are used in hidden layer to arbitrate between the input and the output of
neural network. To calculate the verification score of a speech segment, three speech frames are fed into
the MLP to obtain a classification score for every frame-slot except the first and the last frame-slot, that
is, there are (T-2) classification scores for a speech segment with T frames. And the verification score of
this speech segment is the mean value of the (T-2) classification scores.
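
The score computation just described can be sketched as follows. The layer sizes (78 inputs from three
26-coefficient frames, 30 hidden neurons, one output) and the averaging over (T-2) frame-slots follow
the text; the sigmoid activations, the random weight initialization, and all function names are
assumptions for illustration only.

```python
import math
import random

FRAME_DIM, CONTEXT, HIDDEN = 26, 3, 30  # 26 x 3 = 78 input neurons


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_mlp(rng):
    """Untrained 78-30-1 network with small random weights (placeholder for a trained MLP)."""
    n_in = FRAME_DIM * CONTEXT
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(HIDDEN)]
    b1 = [0.0] * HIDDEN
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
    return w1, b1, w2, 0.0


def classify(mlp, window):
    """Classification score for one 78-dimensional input window."""
    w1, b1, w2, b2 = mlp
    hidden = [sigmoid(sum(w * x for w, x in zip(row, window)) + b) for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def classification_scores(mlp, frames):
    """One score per frame-slot except the first and the last: T-2 scores for T frames."""
    if len(frames) < CONTEXT:
        raise ValueError("a segment needs at least three frames")
    return [
        classify(mlp, frames[t - 1] + frames[t] + frames[t + 1])
        for t in range(1, len(frames) - 1)
    ]


def verification_score(mlp, frames):
    """Mean of the (T-2) classification scores of a speech segment."""
    scores = classification_scores(mlp, frames)
    return sum(scores) / len(scores)
```

A segment of five frames therefore produces three classification scores, and its verification score
is their mean.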

We followed the descriptions in section 2 and section 3 to design the noise-infected data and train the
MLPs, where 150 training iterations were used in the training. The optimal parameter-set was selected
from the 150 training results, and the equal-error-rate (EER) of false rejection and false alarm is used as
the selection criterion. Finally, we obtained 20 MLPs corresponding to the 20 recognition HMMs for the
MLP-based verification.
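
The equal-error-rate criterion can be illustrated with a short sketch. It assumes that a segment is
accepted when its verification score is at or above a threshold, and it approximates the EER by the
threshold at which the false-rejection and false-alarm rates are closest; the paper does not state how
the EER was computed, so the function names and this sweep are assumptions.

```python
def error_rates(correct_scores, anti_scores, threshold):
    """False-rejection and false-alarm rates when scores >= threshold are accepted."""
    false_rejection = sum(s < threshold for s in correct_scores) / len(correct_scores)
    false_alarm = sum(s >= threshold for s in anti_scores) / len(anti_scores)
    return false_rejection, false_alarm


def equal_error_rate(correct_scores, anti_scores):
    """Sweep the observed scores as thresholds and return (EER estimate, threshold)."""
    best = None
    for threshold in sorted(set(correct_scores) | set(anti_scores)):
        fr, fa = error_rates(correct_scores, anti_scores, threshold)
        gap = abs(fr - fa)
        if best is None or gap < best[0]:
            best = (gap, (fr + fa) / 2.0, threshold)
    return best[1], best[2]
```

Under this sketch, model selection would amount to evaluating each of the stored parameter sets on the
development set and keeping the one with the lowest EER.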

The models for the HMM-based utterance verification were also trained in the same experimental
conditions for comparative study. The MVE training (Sukkar, 1998) was performed to train the
subsyllable-based verification models, and we had 20 subsyllable HMMs and 20 anti-subsyllable
HMMs for the HMM-based verification.

4.4 Experimental evaluation and discussion

A pre-defined phone book, including 6S telephone numbers, is the target for recognition and verification
in our evaluation. In order to examine the verification performance, we describe the in-vocabulary (IV) and
the out-of-vocabulary (OOV) databases used in this evaluation as follows.

• DB1: The IV database. This database was collected through telephone networks. There are 1948
  utterances spoken by 661 speakers. The digit-length is 6 or 7 in the pre-defined phone book.

• DB2: The first set of OOV database, which was recorded by handset microphone. This database is
  also a digit-string database but the contents are not included in the phone book. The length of
  each digit-string is ranging from 2 to 6 digits. This database includes 99 speakers and 504
  utterances.

• DB3: The second set of OOV database, which was collected from our auto-attendant system (Jou
  et al., 2000) through telephone networks. The Chinese person-names are the contents of this
  database. This database contains 3479 utterances.

• DB4: The third set of OOV database, which contains the noise data and the spontaneous speeches,
  such as ^*wi\ "ah", and some incomplete sentences collected from our auto-attendant
  system. We collected 1219 utterances for this set.

These 4 sets of databases were also corrupted by adding the CIVIC in-car noise but with the SNRs of 6
dB, 0 dB, and -6 dB separately to observe the robustness of verification models.

We performed utterance verification after digit-string recognition to reject the recognition error. The
verification results were summarized and plotted in Figure 3. Again, the EER is used as the evaluation
criterion for performance observations. It can be observed from this figure that the verification
difficulty is increasing as the SNRs are decreased. And the verification is more difficult for the noisy
condition in high-speed driving (CIVIC-1) than the low-speed one (CIVIC-2). It also can be understood
from this figure that the non-digit-string databases, DB3 (person-names) and DB4 (noise and
spontaneous speech), are easier to reject than DB2 (digit-string).

Examining the performances of these two verification methods, the EERs of the HMM-based
verification increase almost linearly as the SNRs are decreased from 6 dB to -6 dB. The EERs of
the MLP-based method are nearly equal when operated at 6 dB and 0 dB, but are a little higher for
the -6 dB condition. To make this clearer, we further summarized these results in Table 1. It can be noted
from this table that the performance of the MLP-based method and the HMM-based one is very similar
for the 6 dB case. In the other cases, the MLP-based verification presents better results than the
HMM-based one, especially for the -6 dB condition.

For the rejections of substitution errors, we listed the evaluation results of different test conditions in
Table 2 for the baseline system and the system incorporated with UV. The substitution errors are
decreased largely when 3% of false rejection is set. This evaluation also shows the better result on the
reduction of substitution errors by using the MLP-based verification. From these results, it can be
concluded that the MLP-based verification is more insensitive to noise-level variation than the
HMM-based method. This conclusion also coincides with the result of Hong and Chen (1997).

Figure 3: Utterance verification results for different noisy conditions, where the IV database, DB1, was
tested against the OOV databases of (a) DB2, (b) DB3, and (c) DB4.

Table 1: The averaged verification performance of the MLP-based method and the HMM-based one for
the SNRs of 6 dB, 0 dB, and -6 dB.

SNR (dB)    MLP      HMM
  6         0.89%    0.31%
  0         1.02%    1.45%
 -6         1.78%    13^%



Table 2: The substitution errors for the system with and without utterance verifications under different

5 Conclusion

This paper presented the MLP-based utterance verification for in-car speech recognition. In this method,
the subsyllable-based verification models were used. To avoid bias result of using the MLP, a balanced
amount of subsyllable and anti-subsyllable patterns and a general representation of anti-subsyllables for
discriminative training are required. A procedure for the collection of training patterns was introduced
for this requirement. To be robust to car noise-level variation, the noise-immunity learning was applied
and implemented by using the car-noise-infected speech as the training materials of MLPs.

From our experimental results, we found that this MLP-based method is more insensitive to
noise-level variations than the HMM-based one. It seems to be a workable method for in-car utterance
verification. However, the study on real in-car environment is still needed since only additive noise is
considered in this study. Moreover, encouraged by these results, scaling up this approach to large
vocabulary recognition will be also considered in our future work.